{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Neural Networks\n", "\n", "Neural networks are a way of parametrizing non-linear functions. On a very basic level, they are formed by a composition of non-linear function. The functions is defined with a layered architecture. The mapping from the input layer to the output layer is performed via hidden layers. Each layer $k$ produces an output $z_k$ that is a non-linear function of a weighted combination of the outputs of the previous layer, $z_k = g_k(W_k z_{k-1})$. \n", "\n", "Once the architecture and the activation functions $g_k(\\cdot)$ are defined, the weights $W_k$ are trained. If all the functions $g_k$ are (sub)-differentiable then, via the chain rule, gradients exist and can be computed. The weights are trained via different variants of gradient descent. " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "import numpy as np \n", "import matplotlib as mpl \n", "import matplotlib.pyplot as plt \n", "\n", "from sklearn import cluster, datasets, mixture\n", "from sklearn.preprocessing import StandardScaler, MinMaxScaler\n", "from sklearn.neural_network import MLPClassifier, MLPRegressor\n", "from sklearn.model_selection import train_test_split\n", "\n", "import ipywidgets\n", "from ipywidgets import interact, interactive, interact_manual, fixed\n", "\n", "from utilities import plot_helpers\n", "\n", "\n", "%matplotlib inline\n", "%load_ext autoreload\n", "%autoreload 2\n", "from matplotlib import rcParams\n", "rcParams['figure.figsize'] = (10, 5) # Change this if figures look ugly. \n", "rcParams['font.size'] = 16\n", "\n", "import warnings\n", "warnings.filterwarnings(\"ignore\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Classification Demo\n", "\n", "Neural network training has a lot of hyperparameters. Architecture, learning rate, batch size, optimization algorithm, random seed are just a few of them. Because of non-convexity, " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "n_samples = 200\n", "\n", "rcParams['figure.figsize'] = (10, 5) # Change this if figures look ugly. \n", "rcParams['font.size'] = 16\n", "def mlp(dataset, hidden_layer_sizes, activation, solver, reg, noise):\n", " np.random.seed(42)\n", " classifier = MLPClassifier(hidden_layer_sizes=hidden_layer_sizes, \n", " activation=activation,\n", " solver=solver,\n", "# max_iter=n_iter, \n", " alpha=np.power(10., reg),\n", "# verbose=10, \n", "# tol=1e-4, \n", " random_state=1,\n", " learning_rate_init=.1)\n", "\n", " if dataset is 'blobs':\n", " X, Y = datasets.make_blobs(n_samples=n_samples, centers=2, random_state=3, cluster_std=10*noise)\n", " elif dataset is 'circles':\n", " X, Y = datasets.make_circles(n_samples=n_samples, factor=.5, noise=noise, random_state=42)\n", " elif dataset is 'moons':\n", " X, Y = datasets.make_moons(n_samples=n_samples, noise=noise, random_state=42)\n", " elif dataset == 'xor':\n", " np.random.seed(42)\n", " step = int(n_samples/4)\n", " \n", " X = np.zeros((n_samples, 2))\n", " Y = np.zeros(n_samples)\n", " \n", " X[0*step:1*step, :] = noise * np.random.randn(step, 2)\n", " Y[0*step:1*step] = 1\n", " X[1*step:2*step, :] = np.array([1, 1]) + noise * np.random.randn(step, 2)\n", " Y[1*step:2*step] = 1\n", " \n", " X[2*step:3*step, :] = np.array([0, 1]) + noise * np.random.randn(step, 2)\n", " Y[2*step:3*step] = -1\n", " X[3*step:4*step, :] = np.array([1, 0]) + noise * np.random.randn(step, 2)\n", " Y[3*step:4*step] = -1\n", " \n", " elif dataset == 'periodic':\n", " \n", " step = int(n_samples/4)\n", " \n", " X = np.zeros((n_samples, 2))\n", " Y = np.zeros(n_samples)\n", " \n", " X[0*step:1*step, :] = noise * np.random.randn(step, 2)\n", " Y[0*step:1*step] = 1\n", " X[1*step:2*step, :] = np.array([0, 2]) + noise * np.random.randn(step, 2)\n", " Y[1*step:2*step] = 1\n", " \n", " X[2*step:3*step, :] = np.array([0, 1]) + noise * np.random.randn(step, 2)\n", " Y[2*step:3*step] = -1\n", " X[3*step:4*step, :] = np.array([0, 3]) + noise * np.random.randn(step, 2)\n", " Y[3*step:4*step] = -1\n", " \n", " X = X[Y <= 1, :]\n", " Y = Y[Y <=1 ]\n", " Y[Y==0] = -1\n", " \n", " X = StandardScaler().fit_transform(X)\n", " X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=.4)\n", " \n", " classifier.fit(X_train, y_train)\n", " print(classifier.score(X_test, y_test))\n", " \n", " \n", " # plot the line, the points, and the nearest vectors to the plane\n", " plt.figure()\n", " plt.clf()\n", " fig = plt.axes()\n", " opt = {'marker': 'r*', 'label': '+'}\n", " plot_helpers.plot_data(X[np.where(Y == 1)[0], 0], X[np.where(Y == 1)[0], 1], fig=fig, options=opt)\n", " opt = {'marker': 'bs', 'label': '-'}\n", " plot_helpers.plot_data(X[np.where(Y == -1)[0], 0], X[np.where(Y == -1)[0], 1], fig=fig, options=opt)\n", "\n", " mins = np.min(X, 0)\n", " maxs = np.max(X, 0)\n", " x_min = mins[0] - 1\n", " x_max = maxs[0] + 1\n", " y_min = mins[1] - 1\n", " y_max = maxs[1] + 1\n", "\n", " XX, YY = np.mgrid[x_min:x_max:200j, y_min:y_max:200j] \n", " Xplot = np.c_[XX.ravel(), YY.ravel()]\n", " if hasattr(classifier, \"decision_function\"):\n", " Z = classifier.decision_function(Xplot)\n", " else:\n", " Z = classifier.predict_proba(Xplot)[:, 1]\n", " \n", " # Put the result into a color plot\n", " Z = Z.reshape(XX.shape)\n", " # plt.figure(fignum, figsize=(4, 3))\n", " # Put the result into a color plot\n", " plt.contourf(XX, YY, Z, cmap=plt.cm.jet, alpha=.3)\n", " \n", " \n", "interact(mlp, \n", " dataset=['blobs', 'circles', 'moons', 'xor', 'periodic'],\n", " activation=['logistic', 'relu', 'identity', 'tanh'],\n", " solver=['sgd', 'adam','lbfgs'],\n", " hidden_layer_sizes=[(50, ), (100, ), (50, 50), (100, 100), (50, 50, 50), (100, 100, 100)],\n", " reg=ipywidgets.FloatSlider(value=-3,\n", " min=-3,\n", " max=3,\n", " step=0.1,\n", " readout_format='.1f',\n", " description='reg 10^:',\n", " style={'description_width': 'initial'},\n", " continuous_update=False),\n", " noise=ipywidgets.FloatSlider(value=0.05,\n", " min=0.01,\n", " max=0.3,\n", " step=0.01,\n", " readout_format='.2f',\n", " description='noise:',\n", " style={'description_width': 'initial'},\n", " continuous_update=False), \n", " );" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Keras Demo" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "import keras\n", "from keras.models import Sequential\n", "from keras.layers import Dense, Dropout, Activation, Flatten, BatchNormalization\n", "from keras.layers import Conv2D, MaxPooling2D\n", "from keras.datasets import mnist\n", "from keras import backend as K\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "np.random.seed(123) # for reproducibility\n", "\n", "batch_size = 128\n", "num_classes = 10\n", "epochs = 6\n", "\n", "# input image dimensions\n", "img_rows, img_cols = 28, 28\n", "\n", "# 1. Load pre-shuffled MNIST data into train and test sets\n", "(x_train, y_train), (x_test, y_test) = mnist.load_data()\n", "\n", "if K.image_data_format() == 'channels_first':\n", " input_shape = (1, img_rows, img_cols)\n", "else:\n", " input_shape = (img_rows, img_cols, 1)\n", " \n", "# 3. Preprocess class labels\n", "y_train = keras.utils.to_categorical(y_train, num_classes)\n", "y_test = keras.utils.to_categorical(y_test, num_classes)\n", "\n", "# 4. Define model architecture\n", "\n", "ANN = Sequential()\n", "ANN.name = 'ANN'\n", "ANN.add(Dense(512, activation='relu', input_shape=(784,)))\n", "# ANN.add(BatchNormalization(axis=-1, momentum=0.99, epsilon=0.001, center=True, scale=True, beta_initializer='zeros', gamma_initializer='ones', moving_mean_initializer='zeros', moving_variance_initializer='ones', beta_regularizer=None, gamma_regularizer=None, beta_constraint=None, gamma_constraint=None))\n", "# ANN.add(Dropout(0.2))\n", "ANN.add(Dense(512, activation='relu'))\n", "# ANN.add(BatchNormalization(axis=-1, momentum=0.99, epsilon=0.001, center=True, scale=True, beta_initializer='zeros', gamma_initializer='ones', moving_mean_initializer='zeros', moving_variance_initializer='ones', beta_regularizer=None, gamma_regularizer=None, beta_constraint=None, gamma_constraint=None))\n", "# ANN.add(Dropout(0.2))\n", "ANN.add(Dense(num_classes, activation='softmax'))\n", "\n", "model = ANN\n", "\n", "x_train = x_train.reshape(x_train.shape[0], 784)\n", "x_test = x_test.reshape(x_test.shape[0], 784)\n", " \n", "x_train = x_train.astype('float32')\n", "x_test = x_test.astype('float32')\n", "x_train /= 255\n", "x_test /= 255\n", "print('x_train shape:', x_train.shape)\n", "print(x_train.shape[0], 'train samples')\n", "print(x_test.shape[0], 'test samples')\n", "\n", "\n", "model.summary()\n", "model.compile(loss='categorical_crossentropy',\n", " optimizer=keras.optimizers.Adadelta(),\n", " metrics=['accuracy'])\n", "\n", "try:\n", " history = model.fit(x_train, y_train,\n", " batch_size=batch_size,\n", " epochs=epochs,\n", " verbose=1,\n", " validation_data=(x_test, y_test))\n", "except KeyboardInterrupt:\n", " pass\n", "score = model.evaluate(x_test, y_test, verbose=0)\n", "print('')\n", "print('Test loss:', score[0])\n", "print('Test accuracy:', score[1])\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "np.random.seed(123) # for reproducibility\n", "\n", "batch_size = 128\n", "num_classes = 10\n", "epochs = 6\n", "\n", "# input image dimensions\n", "img_rows, img_cols = 28, 28\n", "\n", "# 1. Load pre-shuffled MNIST data into train and test sets\n", "(x_train, y_train), (x_test, y_test) = mnist.load_data()\n", "\n", "if K.image_data_format() == 'channels_first':\n", " input_shape = (1, img_rows, img_cols)\n", "else:\n", " input_shape = (img_rows, img_cols, 1)\n", " \n", "# 3. Preprocess class labels\n", "y_train = keras.utils.to_categorical(y_train, num_classes)\n", "y_test = keras.utils.to_categorical(y_test, num_classes)\n", "\n", "# 4. Define model architecture\n", "CNN = Sequential()\n", "CNN.name = 'CNN'\n", "CNN.add(Conv2D(32, kernel_size=(3, 3),\n", " activation='relu',\n", " input_shape=input_shape))\n", "CNN.add(Conv2D(64, (3, 3), activation='relu'))\n", "CNN.add(MaxPooling2D(pool_size=(2, 2)))\n", "CNN.add(Dropout(0.25))\n", "CNN.add(Flatten())\n", "CNN.add(Dense(128, activation='relu'))\n", "CNN.add(Dropout(0.5))\n", "CNN.add(Dense(num_classes, activation='softmax'))\n", "\n", "\n", "ANN = Sequential()\n", "ANN.name = 'ANN'\n", "ANN.add(Dense(512, activation='relu', input_shape=(784,)))\n", "ANN.add(Dropout(0.2))\n", "ANN.add(Dense(512, activation='relu'))\n", "ANN.add(Dropout(0.2))\n", "ANN.add(Dense(num_classes, activation='softmax'))\n", "\n", "models = [CNN]\n", "\n", "for model in models:\n", " # 2. Preprocess input data\n", " if model.name == 'ANN':\n", " x_train = x_train.reshape(x_train.shape[0], 784)\n", " x_test = x_test.reshape(x_test.shape[0], 784)\n", " elif model.name == 'CNN':\n", " x_train = x_train.reshape(x_train.shape[0], *input_shape)\n", " x_test = x_test.reshape(x_test.shape[0], *input_shape)\n", " \n", " x_train = x_train.astype('float32')\n", " x_test = x_test.astype('float32')\n", " x_train /= 255\n", " x_test /= 255\n", " print('x_train shape:', x_train.shape)\n", " print(x_train.shape[0], 'train samples')\n", " print(x_test.shape[0], 'test samples')\n", "\n", "\n", " model.summary()\n", " model.compile(loss='categorical_crossentropy',\n", " optimizer=keras.optimizers.Adadelta(),\n", " metrics=['accuracy'])\n", " try:\n", " history = model.fit(x_train, y_train,\n", " batch_size=batch_size,\n", " epochs=epochs,\n", " verbose=1,\n", " validation_data=(x_test, y_test))\n", " except KeyboardInterrupt:\n", " pass\n", " \n", " score = model.evaluate(x_test, y_test, verbose=0)\n", " print('Test loss:', score[0])\n", " print('Test accuracy:', score[1])\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Stochastic Learning strategies" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "# different learning rate schedules and momentum parameters\n", "params = [{'solver': 'sgd', 'learning_rate': 'constant', 'momentum': 0,\n", " 'learning_rate_init': 0.2},\n", " {'solver': 'sgd', 'learning_rate': 'constant', 'momentum': .9,\n", " 'nesterovs_momentum': False, 'learning_rate_init': 0.2},\n", " {'solver': 'sgd', 'learning_rate': 'constant', 'momentum': .9,\n", " 'nesterovs_momentum': True, 'learning_rate_init': 0.2},\n", " {'solver': 'sgd', 'learning_rate': 'invscaling', 'momentum': 0,\n", " 'learning_rate_init': 0.2},\n", " {'solver': 'sgd', 'learning_rate': 'invscaling', 'momentum': .9,\n", " 'nesterovs_momentum': True, 'learning_rate_init': 0.2},\n", " {'solver': 'sgd', 'learning_rate': 'invscaling', 'momentum': .9,\n", " 'nesterovs_momentum': False, 'learning_rate_init': 0.2},\n", " {'solver': 'adam', 'learning_rate_init': 0.01}]\n", "\n", "labels = [\"constant learning-rate\", \"constant with momentum\",\n", " \"constant with Nesterov's momentum\",\n", " \"inv-scaling learning-rate\", \"inv-scaling with momentum\",\n", " \"inv-scaling with Nesterov's momentum\", \"adam\"]\n", "\n", "plot_args = [{'c': 'red', 'linestyle': '-'},\n", " {'c': 'green', 'linestyle': '-'},\n", " {'c': 'blue', 'linestyle': '-'},\n", " {'c': 'red', 'linestyle': '--'},\n", " {'c': 'green', 'linestyle': '--'},\n", " {'c': 'blue', 'linestyle': '--'},\n", " {'c': 'black', 'linestyle': '-'}]\n", "\n", "def plot_on_dataset(dataset):\n", " # Load datasets. \n", " plt.figure()\n", " max_iter = 400\n", " if dataset == \"iris\":\n", " data = datasets.load_iris()\n", " X = data.data\n", " y = data.target\n", " elif dataset == \"digits\":\n", " data = datasets.load_digits()\n", " X = data.data\n", " y = data.target\n", " max_iter = 15\n", " elif dataset == \"circles\":\n", " X, y = datasets.make_circles(noise=0.2, factor=0.5, random_state=1)\n", " elif dataset == 'moons':\n", " X, y = datasets.make_moons(noise=0.3, random_state=0)\n", " X = MinMaxScaler().fit_transform(X)\n", " \n", " # Train Classifiers.\n", " classifiers = []\n", " for label, param in zip(labels, params):\n", " classifier = MLPClassifier(verbose=0, \n", " random_state=0,\n", " max_iter=max_iter, **param)\n", " classifier.fit(X, y)\n", " classifiers.append(classifier)\n", " for classifier, label, args in zip(classifiers, labels, plot_args):\n", " plt.plot(classifier.loss_curve_, label=label, **args)\n", " \n", " plt.legend(ncol=2, loc=\"best\")\n", " plt.xlabel('iterations')\n", " plt.ylabel('Error')\n", "\n", "interact(plot_on_dataset, dataset=['iris', 'digits', 'circles', 'moons']);" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "# Universal function Aproximator" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": false }, "outputs": [], "source": [ "from sklearn import svm\n", "\n", "def laplacian_kernel(X, Y, bw):\n", " rows = X.shape[0]\n", " cols = Y.shape[0]\n", " K = np.zeros((rows, cols))\n", " for col in range(cols):\n", " dist = bw * np.linalg.norm(X - Y[col, :], ord=1, axis=1)\n", " K[:, col] = np.exp(-dist)\n", " return K\n", "\n", "def process_regressor(regressor, xtrain, ytrain, xplot, yplot):\n", " regressor.fit(np.reshape(xtrain, (xtrain.size, 1)), ytrain)\n", "\n", " yhat = regressor.predict(np.reshape(xplot, (xplot.size, 1)))\n", "\n", "\n", " plt.scatter(xtrain, ytrain, label=\"Training data\", alpha=0.2)\n", " plt.plot(xplot, yplot, 'r-', label=\"True Function\")\n", " plt.plot(xplot, yhat, 'g-', label=\"Prediction\")\n", "\n", " plt.legend(loc='lower center');\n", " plt.ylim([np.min(yplot)*1.1, np.max(yplot)*1.1])\n", "\n", "def NNregressor(activation, solver, hidden_layer_size, reg, xtrain, ytrain, xplot, yplot):\n", " regressor = MLPRegressor(activation=activation,\n", " solver=solver,\n", " alpha=reg,\n", " random_state=0,\n", " hidden_layer_sizes=hidden_layer_size,\n", " tol=1e-6,\n", " max_iter=1000\n", " )\n", " process_regressor(regressor, xtrain, ytrain, xplot, yplot)\n", "\n", "def SVMregressor(kernel, bw, reg, xtrain, ytrain, xplot, yplot):\n", " if kernel == 'rbf':\n", " gamma = np.power(10., -bw)\n", " coef0 = 0\n", " elif kernel == 'laplacian':\n", " gamma = np.power(10., -bw)\n", " coef0 = 0\n", " kernel = lambda X, Y: laplacian_kernel(X, Y, gamma)\n", " \n", " regressor = svm.SVR(kernel=kernel, C=1./reg, gamma=gamma,coef0=coef0)\n", " process_regressor(regressor, xtrain, ytrain, xplot, yplot)\n", "\n", " \n", "def uat_demo(function, n_samples, noise, family):\n", " if function == 1:\n", " f = lambda x: np.sin(x) \n", " elif function == 2:\n", " f = lambda x: np.sin(x) * np.exp(np.abs(x))\n", " elif function == 3:\n", " f = lambda x: np.sin(x) * np.floor(np.abs(x))\n", " elif function == 4:\n", " f = lambda x: np.sin(x * np.floor(np.abs(x)))\n", "\n", " xmin = -6\n", " xmax = +6\n", " xplot = np.arange(xmin, xmax, 0.01)\n", " yplot = f(xplot)\n", "\n", " xtrain = xmin + (xmax -xmin) * np.random.rand(n_samples)\n", " ytrain = f(xtrain) + noise * np.random.randn(xtrain.size)\n", " \n", " if family == 'NN':\n", " regressor = interact(\n", " NNregressor,\n", " solver=['lbfgs', 'sgd', 'adam'],\n", " activation=['relu', 'identity', 'logistic'],\n", " hidden_layer_size=[(1,), (5, ), (50, ), (100, ), (1000, ),\n", " (5, 5, ), (50, 50, ), (100, 100), \n", " (50, 50, 50), (100, 100, 100)],\n", " reg=[0, 10**-3, 10**-2, 10**-1, 1], \n", " xtrain=fixed(xtrain), \n", " ytrain=fixed(ytrain), \n", " xplot=fixed(xplot), \n", " yplot=fixed(yplot))\n", " \n", " elif family == 'SVM':\n", " regressor = interact(\n", " SVMregressor,\n", " kernel=['rbf', 'laplacian'],\n", " bw=ipywidgets.FloatSlider(value=-1,\n", " min=-3,\n", " max=3,\n", " step=0.1,\n", " readout_format='.1f',\n", " description='Bandwidth 10^:',\n", " style={'description_width': 'initial'},\n", " continuous_update=False),\n", " reg=[10**-3, 10**-2, 10**-1, 1], \n", " xtrain=fixed(xtrain), \n", " ytrain=fixed(ytrain), \n", " xplot=fixed(xplot), \n", " yplot=fixed(yplot))\n", "\n", "interact(uat_demo, \n", " n_samples=[100, 200, 500, 1000, 10000],\n", " noise=[0, 0.01, 0.05, 0.1, 0.5,],\n", " function=ipywidgets.ToggleButtons(value=1, \n", " options=[1, 2, 3, 4], \n", " description='Function:',\n", " style={'description_width': 'initial'}),\n", " family=['NN', 'SVM']\n", " );" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 2 }